权重和激活的量化是减少深神经网络(DNN)训练的计算占地面积的主要方法之一。当前方法使得4位量化的前向阶段。但是,这仅构成了培训过程的三分之一。减少整个训练过程的计算占地面积需要定量神经梯度,即相对于中间神经层的输出的损耗梯度。在这项工作中,我们研究了在量化神经网络训练中具有无偏差值的重要性,以及如何维护它,以及如何。基于此,我们建议一个$ \ texit {logarithic unbiased量化} $(luq)方法,以将前向和向后阶段量化为4位,实现最先进的导致4位训练,没有开销。例如,在Imagenet的Reset50中,我们实现了1.18%的降级。我们进一步改善了这一点以降解仅在高精度微调的单一时期与差异减少方法结合后的单一时期 - 均增加与先前建议的方法相当的开销。最后,我们建议使用低精度格式的方法来避免在训练过程的三分之二期间乘法,从而减少乘法器使用的5倍。
translated by 谷歌翻译
背景:洪水是世界上最常见的自然灾害,影响数亿岁的生活。因此,洪水预测是一项重要的重要努力,通常使用物理水流模拟实现,依赖于准确的地形升降映射。然而,这种基于求解部分微分方程的这种模拟是在大规模上计算上的禁止。这种可扩展性问题通常使用高程地图的粗网格表示,尽管这种表示可能扭曲了至关重要的地形细节,导致模拟中的显着不准确。贡献:我们训练一个深度神经网络,以执行地形地图的物理信息信息:我们优化地形地图的粗网格表示,以便洪水预测将匹配细网解决方案。对于成功的学习过程,我们专门为此任务配置数据集。我们证明,通过这种方法,可以实现计算成本的显着降低,同时保持准确的解决方案。参考实施伴随着该文件以及数据集再现的文档和代码。
translated by 谷歌翻译
背景。通常,深度神经网络(DNN)概括了从类似于训练集的分布的样本概括。然而,当测试样本从不同的分布中抽出时,DNNS的预测是脆性和不可靠的。这是在现实世界应用中部署的主要关注点,这种行为可能以相当大的成本,例如工业生产线,自治车辆或医疗保健应用。贡献。我们将DNN中的分布(OOD)检测出来作为统计假设检测问题。在我们所提出的框架内产生的测试将证据组合来自整个网络。与以前的检测启发式不同,此框架返回每个测试样本的$ p $ -value。有保证维护I型错误(T1E - 错误地识别OOD样本为ID)进行测试数据。此外,这允许在保持T1E的同时组合多个检测器。在此框架上建立,我们建议一种基于低阶统计数据的新型程序。我们的方法在不接受的EOD基准上的最新方法实现了比较或更好的结果,而无需再培训网络参数或假设测试分配的现有知识 - 并且以计算成本的一小部分。
translated by 谷歌翻译
A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach [2018], we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "rich" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a simple two-layer model that already exhibits an interesting and meaningful transition between the kernel and rich regimes, and we demonstrate the transition for more complex matrix factorization models and multilayer non-linear networks.
translated by 谷歌翻译
我们检查了在未注册的逻辑回归问题上的梯度下降,并在线性可分离数据集上具有均匀的线性预测指标。我们显示了预测变量收敛到最大边缘(硬边缘SVM)解决方案的方向。结果还推广到其他单调的损失函数,在无穷大时降低了损失功能,多级问题,并在某个受限的环境中训练在深网中的重量层。此外,我们表明这种融合非常慢,只有在损失本身的融合中的对数。这可以有助于解释即使训练错误为零,并且训练损失非常小,并且正如我们所显示的,即使验证损失增加了,也可以继续优化逻辑或跨透明度损失的好处。我们的方法还可以帮助理解隐式正则化n更复杂的模型以及其他优化方法。
translated by 谷歌翻译
In this work we introduce a binarized deep neural network (BDNN) model. BDNNs are trained using a novel binarized back propagation algorithm (BBP), which uses binary weights and binary neurons during the forward and backward propagation, while retaining precision of the stored weights in which gradients are accumulated. At test phase, BDNNs are fully binarized and can be implemented in hardware with low circuit complexity. The proposed binarized networks can be implemented using binary convolutions and proxy matrix multiplications with only standard binary XNOR and population count (popcount) operations. BBP is expected to reduce energy consumption by at least two orders of magnitude when compared to the hardware implementation of existing training algorithms. We obtained near state-of-the-art results with BDNNs on the permutation-invariant MNIST, CIFAR-10 and SVHN datasets.
translated by 谷歌翻译
The performance of inertial navigation systems is largely dependent on the stable flow of external measurements and information to guarantee continuous filter updates and bind the inertial solution drift. Platforms in different operational environments may be prevented at some point from receiving external measurements, thus exposing their navigation solution to drift. Over the years, a wide variety of works have been proposed to overcome this shortcoming, by exploiting knowledge of the system current conditions and turning it into an applicable source of information to update the navigation filter. This paper aims to provide an extensive survey of information aided navigation, broadly classified into direct, indirect, and model aiding. Each approach is described by the notable works that implemented its concept, use cases, relevant state updates, and their corresponding measurement models. By matching the appropriate constraint to a given scenario, one will be able to improve the navigation solution accuracy, compensate for the lost information, and uncover certain internal states, that would otherwise remain unobservable.
translated by 谷歌翻译
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.
translated by 谷歌翻译
In the present work we propose an unsupervised ensemble method consisting of oblique trees that can address the task of auto-encoding, namely Oblique Forest AutoEncoders (briefly OF-AE). Our method is a natural extension of the eForest encoder introduced in [1]. More precisely, by employing oblique splits consisting in multivariate linear combination of features instead of the axis-parallel ones, we will devise an auto-encoder method through the computation of a sparse solution of a set of linear inequalities consisting of feature values constraints. The code for reproducing our results is available at https://github.com/CDAlecsa/Oblique-Forest-AutoEncoders.
translated by 谷歌翻译
When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
translated by 谷歌翻译